Efficient Data-Parallel Tree-Traversal for BlobTrees (revised)
Authors
Abstract
The hierarchical implicit modelling paradigm, as exemplified by the BlobTree, supports not only Boolean operations and affine transformations but also various forms of blending and space warping. Typically, the resulting solid is converted for rendering into a boundary representation, namely a triangle-mesh approximation. These triangles are obtained by evaluating the corresponding implicit function (field) at the samples of a dense regular three-dimensional grid and by performing a local isosurface extraction at each voxel. The performance bottleneck of this rendering process lies in the cost of the tree traversal (which typically must be executed hundreds of millions of times) and in the cost of applying to the grid samples the inverses of the space transformations associated with some of the nodes of the tree. Tree pruning is commonly used to reduce the number of samples for which the field value must be computed. Here, we propose a complementary strategy that reduces both the cost of tree traversal and the cost of applying the inverses of the blending and warping transformations associated with each evaluation. Without blending or warping, a BlobTree reduces to a CSG tree containing only Boolean nodes and affine transformations, which can be reordered to increase memory coherence. Furthermore, the cumulative effects of the affine transformations can be precomputed via matrix multiplication. We extend these techniques from CSG trees to fully general BlobTrees. These extensions are based on tree reordering, bottom-up traversal, and caching of the combined matrix for uninterrupted runs of affine transformations in the BlobTree. We show that these new techniques yield an order-of-magnitude performance improvement for rendering large BlobTrees on modern Single Program Multiple Data (SPMD) devices.
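The following is a minimal Python sketch, under stated assumptions, of the two CSG-level ideas the abstract builds on: folding each uninterrupted run of affine nodes into one precomputed (cached) inverse matrix, and replacing recursive top-down traversal with a bottom-up, stack-based pass over a flattened post-order program. The node types (`Sphere`, `Affine`, `Op`), the field falloff, and all helper names are hypothetical. Union is modelled as `max` and blending as plain summation. Warp nodes, which are non-affine and cannot be folded into a matrix this way, are not modelled. This is an illustration, not the authors' SPMD implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Mat = List[List[float]]          # 4x4 row-major affine matrix
Vec = Tuple[float, float, float]
IDENTITY: Mat = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def matmul(a: Mat, b: Mat) -> Mat:
    """Return a @ b, i.e. apply b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def apply(m: Mat, p: Vec) -> Vec:
    return tuple(m[i][0] * p[0] + m[i][1] * p[1] + m[i][2] * p[2] + m[i][3] for i in range(3))

def translate_inv(tx: float, ty: float, tz: float) -> Mat:
    """Inverse of a translation by (tx, ty, tz): a translation by (-tx, -ty, -tz)."""
    m = [row[:] for row in IDENTITY]
    m[0][3], m[1][3], m[2][3] = -tx, -ty, -tz
    return m

@dataclass
class Sphere:                    # hypothetical compactly supported blob at the local origin
    radius: float
    def field(self, p: Vec) -> float:
        r2 = (p[0] ** 2 + p[1] ** 2 + p[2] ** 2) / self.radius ** 2
        return (1.0 - r2) ** 3 if r2 < 1.0 else 0.0

@dataclass
class Affine:                    # stores the INVERSE transform, applied to sample points
    inv: Mat
    child: "Node"

@dataclass
class Op:                        # "union" -> max, "blend" -> summation; at least one child
    kind: str
    children: List["Node"]

Node = Union[Sphere, Affine, Op]

def combine(kind: str, values: List[float]) -> float:
    return max(values) if kind == "union" else sum(values)

def eval_recursive(node: Node, p: Vec) -> float:
    """Reference top-down evaluation: applies each inverse matrix one node at a time."""
    if isinstance(node, Sphere):
        return node.field(p)
    if isinstance(node, Affine):
        return eval_recursive(node.child, apply(node.inv, p))
    return combine(node.kind, [eval_recursive(c, p) for c in node.children])

def compile_postorder(node: Node, cum: Mat = IDENTITY, out=None) -> list:
    """Flatten the tree into post-order instructions; Affine nodes emit none.

    Each run of Affine nodes is folded into one cumulative inverse matrix
    that is cached on every primitive below it.
    """
    if out is None:
        out = []
    if isinstance(node, Sphere):
        out.append(("prim", node, cum))
    elif isinstance(node, Affine):
        # The child sees this node's inverse applied after the matrices above it.
        compile_postorder(node.child, matmul(node.inv, cum), out)
    else:
        for c in node.children:
            compile_postorder(c, cum, out)
        out.append(("op", node.kind, len(node.children)))
    return out

def eval_postorder(program: list, p: Vec) -> float:
    """Bottom-up, stack-based evaluation of the flattened program (no recursion)."""
    stack: List[float] = []
    for instr in program:
        if instr[0] == "prim":
            _, prim, m = instr
            stack.append(prim.field(apply(m, p)))
        else:
            _, kind, arity = instr
            start = len(stack) - arity
            args = stack[start:]
            del stack[start:]
            stack.append(combine(kind, args))
    return stack.pop()

# Usage: a blend of a doubly translated sphere with a union of two spheres.
tree = Op("blend", [
    Affine(translate_inv(1.0, 0.0, 0.0), Affine(translate_inv(0.0, 0.5, 0.0), Sphere(1.0))),
    Op("union", [Sphere(0.8), Affine(translate_inv(-0.5, 0.0, 0.0), Sphere(0.6))]),
])
program = compile_postorder(tree)
for q in [(0.0, 0.0, 0.0), (0.7, 0.3, 0.0), (2.0, 2.0, 2.0)]:
    assert abs(eval_recursive(tree, q) - eval_postorder(program, q)) < 1e-12
```

In this sketch the per-sample loop touches only primitives and operators stored in a flat array, which is the property that makes a bottom-up pass attractive on SPMD hardware; the two consecutive translations above cost a single matrix application per sample instead of two.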